Smart Paper Notes

Deep Multimodal Guidance for Medical Image Classification

Mayur Mallya , Ghassan Hamarneh

Category: Computer Vision

2022-03-10

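Medical imaging is a cornerstone of therapy and diagnosis in modern medicine. However, the choice of imaging modality for a particular theranostic task typically involves a trade-off between the feasibility of using a given modality (e.g., short wait times, low cost, fast acquisition, reduced radiation or invasiveness) and the expected performance on a clinical task (e.g., diagnostic accuracy, efficacy of treatment planning and guidance). In this work, we aim to apply the knowledge learned from the less feasible but better-performing (superior) modality to guide the use of the more feasible yet under-performing (inferior) modality and steer it toward improved performance. We focus on deep learning for image-based diagnosis. We develop a lightweight guidance model that leverages the latent representation learned from the superior modality when training a model that consumes only the inferior modality. We examine the advantages of our method in two clinical applications: multi-task skin lesion classification from clinical and dermoscopic images, and brain tumor classification from multi-sequence magnetic resonance imaging (MRI) and histopathology images. In both scenarios, we show a boost in the diagnostic performance of the inferior modality without requiring the superior modality. Furthermore, for brain tumor classification, our method outperforms the model trained on the superior modality while producing results comparable to the model that uses both modalities during inference.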

translated by Google Translate

Do I have the Knowledge to Answer? Investigating Answerability of Knowledge Base Questions

Mayur Patidar , Avinash Singh , Prayushi Faldu , Lovekesh Vig , Indrajit Bhattacharya , Mausam

Category: Natural Language Processing | Artificial Intelligence

2022-12-20

When answering natural language questions over knowledge bases (KBs), incompleteness in the KB can naturally lead to many questions being unanswerable. While answerability has been explored in other QA settings, it has not been studied for QA over knowledge bases (KBQA). We first identify various forms of KB incompleteness that can result in a question being unanswerable. We then propose GrailQAbility, a new benchmark dataset, which systematically modifies GrailQA (a popular KBQA dataset) to represent all these incompleteness issues. Testing two state-of-the-art KBQA models (trained on original GrailQA as well as our GrailQAbility), we find that both models struggle to detect unanswerable questions, or sometimes detect them for the wrong reasons. Consequently, both models suffer significant loss in performance, underscoring the need for further research in making KBQA systems robust to unanswerability.

Predicting Citi Bike Demand Evolution Using Dynamic Graphs

Alexander Saff , Mayur Bhandary , Siddharth Srivastava

Category: Machine Learning

2022-12-18

Bike sharing systems often suffer from poor capacity management as a result of variable demand. These systems would benefit from models that predict demand in order to moderate the number of bikes stored at each station. In this paper, we attempt to apply a graph neural network model to predict bike demand in the New York City Citi Bike dataset.

Double U-Net for Super-Resolution and Segmentation of Live Cell Images

Mayur Bhandary , J. Patricio Reyes , Eylul Ertay , Aman Panda

Category: Computer Vision

2022-12-05

Accurate segmentation of live cell images has broad applications in clinical and research contexts. Deep learning methods have been able to perform cell segmentation with high accuracy; however, developing machine learning models to do this requires access to high-fidelity images of live cells. Such images are often unavailable due to resource constraints, like limited access to high-performance microscopes, or due to the nature of the studied organisms. Segmentation on low-resolution images of live cells is a difficult task. This paper proposes a method to perform live cell segmentation with low-resolution images by performing super-resolution as a pre-processing step in the segmentation pipeline.

SPACE: Speech-driven Portrait Animation with Controllable Expression

Siddharth Gururani , Arun Mallya , Ting-Chun Wang , Rafael Valle , Ming-Yu Liu

Category: Computer Vision

2022-11-17

Animating portraits using speech has received growing attention in recent years, with various creative and practical use cases. An ideal generated video should have good lip sync with the audio, natural facial expressions and head motions, and high frame quality. In this work, we present SPACE, which uses speech and a single image to generate high-resolution, expressive videos with realistic head pose, without requiring a driving video. It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator. SPACE also allows for the control of emotions and their intensities. Our method outperforms prior methods in objective metrics for image quality and facial motions and is strongly preferred by users in pair-wise comparisons. The project website is available at https://deepimagination.cc/SPACE/

UMFuse: Unified Multi View Fusion for Human Editing applications

Rishabh Jain , Mayur Hemani , Duygu Ceylan , Krishna Kumar Singh , Jingwan Lu , Mausoom Sarkar , Balaji Krishnamurthy

Category: Computer Vision | Artificial Intelligence

2022-11-17

The vision community has explored numerous pose guided human editing methods due to their extensive practical applications. Most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. However, the problem is ill-defined in cases when the target pose is significantly different from the input pose. Existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the utilization of multiple views to minimize the issue of missing information and generate an accurate representation of the underlying human model. To fuse the knowledge from multiple viewpoints, we design a selector network that takes the pose keypoints and texture from images and generates an interpretable per-pixel selection map. After that, the encodings from a separate network (trained on a single image human reposing task) are merged in the latent space. This enables us to generate accurate, precise, and visually coherent images for different editing tasks. We show the application of our network on 2 newly proposed tasks - Multi-view human reposing, and Mix-and-match human image generation. Additionally, we study the limitations of single-view editing and scenarios in which multi-view provides a much better alternative.

AdaViT: Adaptive Tokens for Efficient Vision Transformer

Hongxu Yin , Arash Vahdat , Jose Alvarez , Arun Mallya , Jan Kautz , Pavlo Molchanov

Category: Computer Vision | Machine Learning

2021-12-14

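We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformers (ViT) for images of different complexity. AdaViT achieves this by automatically reducing the number of tokens that are processed in the network as inference proceeds. We reformulate Adaptive Computation Time (ACT) for this task, extending halting to discard redundant spatial tokens. The appealing architectural properties of vision transformers enable our adaptive token reduction mechanism to speed up inference without modifying the network architecture or inference hardware. We demonstrate that AdaViT requires no extra parameters or sub-network for halting, as we base the learning of adaptive halting on the original network parameters. We further introduce a distributional prior regularization that stabilizes training compared to prior ACT approaches. On the image classification task (ImageNet1K), we show that our proposed AdaViT yields high efficacy in filtering informative spatial features and cutting down on the overall compute. The proposed method improves the throughput of DeiT-Tiny by 62% and DeiT-Small by 38% with only a 0.3% accuracy drop, outperforming prior art by a large margin.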
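
The token-halting idea in the abstract above can be illustrated with a minimal sketch of ACT-style accumulation. This is not the authors' implementation: the function name `active_tokens_per_layer`, the threshold `eps`, and the assumption that per-layer halting scores are already given as plain numbers are all hypothetical choices made for illustration. Each token accumulates its halting score layer by layer and is dropped from later layers once the running total reaches `1 - eps`.

```python
def active_tokens_per_layer(halting_scores, eps=0.01):
    """Return, for each layer, the indices of tokens still being processed.

    halting_scores[l][k] is an assumed, precomputed halting score in [0, 1]
    for token k at layer l (hypothetical input, not the paper's exact form).
    """
    num_tokens = len(halting_scores[0])
    cumulative = [0.0] * num_tokens      # running halting score per token
    active = list(range(num_tokens))     # every token starts out active
    history = []
    for layer_scores in halting_scores:
        history.append(list(active))     # tokens processed by this layer
        still_active = []
        for k in active:
            cumulative[k] += layer_scores[k]
            # A token halts once its cumulative score reaches 1 - eps.
            if cumulative[k] < 1.0 - eps:
                still_active.append(k)
        active = still_active
    return history


scores = [
    [0.2, 1.0, 0.5],   # layer 0: token 1 halts immediately
    [0.2, 0.5, 0.6],   # layer 1: token 2 halts (0.5 + 0.6 >= 0.99)
    [0.2, 0.5, 0.1],   # layer 2: only token 0 is still processed
]
history = active_tokens_per_layer(scores)
processed = sum(len(tokens) for tokens in history)
print(history, processed)
```

In this toy run, 6 token-layer evaluations are performed instead of 9, which is the kind of saving that discarding halted tokens is meant to provide.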

Multimodal Conditional Image Synthesis with Product-of-Experts GANs

Xun Huang , Arun Mallya , Ting-Chun Wang , Ming-Yu Liu

Category: Computer Vision

2021-12-09

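Existing conditional image synthesis frameworks generate images based on user inputs in a single modality, such as text, segmentation, sketch, or style reference. They are often unable to leverage multimodal user inputs when available, which reduces their practicality. To address this limitation, we propose the Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework, which can synthesize images conditioned on multiple input modalities or any subset of them, even the empty set. PoE-GAN consists of a product-of-experts generator and a multimodal multiscale projection discriminator. Through our carefully designed training scheme, PoE-GAN learns to synthesize images with high quality and diversity. Besides advancing the state of the art in multimodal conditional image synthesis, PoE-GAN also outperforms the best existing unimodal conditional image synthesis approaches when tested in the unimodal setting. The project website is available at https://deepimagination.github.io/poe-gan.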
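
To make the product-of-experts idea concrete, here is a minimal sketch that combines one-dimensional Gaussian experts. The Gaussian form, the standard-normal prior, and the function name `product_of_gaussians` are assumptions made for illustration; the abstract above does not specify them. The product of Gaussian densities is again Gaussian, with precision equal to the sum of the experts' precisions and a precision-weighted mean. Always including the prior is one simple way to handle any subset of modalities, including the empty set.

```python
def product_of_gaussians(experts, prior=(0.0, 1.0)):
    """Combine 1-D Gaussian experts given as (mean, variance) pairs.

    The prior (an assumed standard normal) is always included as an expert,
    so an empty list of experts simply returns the prior.
    """
    precision = 0.0        # sum of 1 / variance over all experts
    weighted_mean = 0.0    # sum of mean / variance over all experts
    for mean, var in [prior] + list(experts):
        precision += 1.0 / var
        weighted_mean += mean / var
    return weighted_mean / precision, 1.0 / precision


# Hypothetical experts, e.g. one per input modality.
no_input = product_of_gaussians([])
one_input = product_of_gaussians([(2.0, 1.0)])
two_inputs = product_of_gaussians([(2.0, 1.0), (4.0, 0.5)])
print(no_input, one_input, two_inputs)
```

Each added expert can only shrink the combined variance, which matches the intuition that every extra modality narrows down the set of plausible outputs.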

Hybrid Feedback for Autonomous Navigation in Environments with Arbitrary Convex Obstacles

Mayur Sawant , Soulaimane Berkane , Ilia Polushin , Abdelhamid Tayebi

Category: Robotics

2021-11-17

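We develop an autonomous navigation algorithm for a robot operating in two-dimensional environments cluttered with obstacles of arbitrary convex shape. The proposed navigation approach relies on hybrid feedback to guarantee global asymptotic stabilization of the robot to a predefined target location while ensuring forward invariance of the obstacle-free workspace. The main idea is to design an appropriate switching strategy between the move-to-target mode and the obstacle-avoidance mode, based on the proximity of the robot to the nearest obstacle. The proposed hybrid controller generates continuous velocity input trajectories when the robot is initialized away from the boundaries of the obstacles. Finally, we provide an algorithmic procedure for the sensor-based implementation of the proposed hybrid controller and validate its effectiveness through simulation results.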
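
The switching between two modes based on obstacle proximity can be sketched as a generic hysteresis rule. This is not the controller from the paper: the thresholds `d_enter` and `d_exit`, the function names, and the purely distance-based exit condition are hypothetical simplifications. Using two different thresholds keeps the mode from chattering when the robot hovers near a single boundary.

```python
MOVE_TO_TARGET = "move_to_target"
AVOID_OBSTACLE = "avoid_obstacle"


def next_mode(mode, dist_to_nearest_obstacle, d_enter=0.5, d_exit=1.0):
    """Hysteresis switch between two modes (thresholds are assumed values)."""
    # Enter avoidance once the robot gets close enough to an obstacle.
    if mode == MOVE_TO_TARGET and dist_to_nearest_obstacle <= d_enter:
        return AVOID_OBSTACLE
    # Leave avoidance only after the robot has moved clearly away again.
    if mode == AVOID_OBSTACLE and dist_to_nearest_obstacle >= d_exit:
        return MOVE_TO_TARGET
    return mode


def mode_sequence(distances, mode=MOVE_TO_TARGET):
    """Apply the switching rule along a sequence of measured distances."""
    modes = []
    for d in distances:
        mode = next_mode(mode, d)
        modes.append(mode)
    return modes


modes = mode_sequence([2.0, 0.8, 0.4, 0.8, 1.2])
print(modes)
```

Note that the same distance (0.8) maps to different modes depending on the current mode, which is what makes the rule hybrid rather than a static function of the state.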

Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

Hongxu Yin , Pavlo Molchanov , Zhizhong Li , Jose M. Alvarez , Arun Mallya , Derek Hoiem , Niraj K. Jha , Jan Kautz

Category:

2019-12-18
